virtual drift
Drift Detection: Introducing Gaussian Split Detector
Fuccellaro, Maxime, Simon, Laurent, Zemmari, Akka
Recent research has yielded a wide array of drift detectors. However, to achieve strong performance, most of them require the true class labels to be available during the drift detection phase. This paper aims to detect drift when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD), a novel drift detector that works in batch mode. GSD is designed to work when the data follow a normal distribution and makes use of Gaussian mixture models to monitor changes in the decision boundary. The algorithm is designed to handle multi-dimensional data streams and to work without the ground truth labels during the inference phase, making it pertinent for real-world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift, which is key to avoiding false alarms.
- North America > United States > California > Orange County > Irvine (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
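The Gaussian Split Detector abstract above describes monitoring the decision boundary with Gaussian mixture models on unlabeled batches. The sketch below is not the GSD algorithm, whose details the abstract does not give. It is a toy 1-D illustration of that general idea under stated assumptions: two classes, a plain EM fit written by hand, and a hypothetical tolerance of 0.5 on boundary movement. All function names and numbers are made up for illustration.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_two_gaussians(xs, iterations=50):
    """Fit a 1-D two-component Gaussian mixture to unlabeled data with plain EM."""
    means, stds, weights = [min(xs), max(xs)], [1.0, 1.0], [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of component 0 for every point.
        resp0 = []
        for x in xs:
            p0 = weights[0] * normal_pdf(x, means[0], stds[0])
            p1 = weights[1] * normal_pdf(x, means[1], stds[1])
            resp0.append(p0 / (p0 + p1))
        # M-step: re-estimate mean, std and weight of each component.
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - v for v in resp0]
            total = sum(r)
            means[k] = sum(ri * x for ri, x in zip(r, xs)) / total
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, xs)) / total
            stds[k] = max(math.sqrt(var), 1e-3)
            weights[k] = total / len(xs)
    return means, stds, weights

def decision_boundary(means, stds, weights):
    """Bisect between the two means for the point of equal weighted density."""
    lo, hi = sorted(means)
    k_lo = means.index(lo)
    k_hi = 1 - k_lo
    for _ in range(60):
        mid = (lo + hi) / 2.0
        left = weights[k_lo] * normal_pdf(mid, means[k_lo], stds[k_lo])
        right = weights[k_hi] * normal_pdf(mid, means[k_hi], stds[k_hi])
        if left > right:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def batch(rng, mean_a, mean_b, n=1000):
    """Unlabeled batch: n points from N(mean_a, 1) and n points from N(mean_b, 1)."""
    return [rng.gauss(mean_a, 1.0) for _ in range(n)] + \
           [rng.gauss(mean_b, 1.0) for _ in range(n)]

rng = random.Random(0)
TOLERANCE = 0.5  # hypothetical threshold on boundary movement

b_reference = decision_boundary(*fit_two_gaussians(batch(rng, -2.0, 2.0)))
b_spread = decision_boundary(*fit_two_gaussians(batch(rng, -3.0, 3.0)))  # boundary stays
b_shifted = decision_boundary(*fit_two_gaussians(batch(rng, -2.0, 4.0)))  # boundary moves

alarm_spread = abs(b_spread - b_reference) > TOLERANCE
alarm_shifted = abs(b_shifted - b_reference) > TOLERANCE
```

In this toy setup the batch whose clusters move apart symmetrically leaves the estimated boundary near its reference position, while the batch in which one cluster moves pushes the boundary away and raises the alarm. No labels are used in any of the fits.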
Explaining Drift using Shapley Values
Edakunni, Narayanan U., Tekriwal, Utkarsh, Jain, Anukriti
Machine learning models often deteriorate in performance when they are used to predict outcomes on data that differ from the data on which they were trained. Such scenarios often arise in the real world when the distribution of data changes gradually or abruptly due to major events like a pandemic. There have been many attempts in machine learning research to come up with techniques that are resilient to such concept drift. However, there is no principled framework to identify the drivers behind the drift in model performance. In this paper, we propose a novel framework, DBShap, that uses Shapley values to identify the main contributors to the drift and quantify their respective contributions. The proposed framework not only quantifies the importance of individual features in driving the drift but also includes the change in the underlying relation between the input and output as a possible driver. The explanation provided by DBShap can be used to understand the root cause behind the drift and to make the model resilient to it.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
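The DBShap abstract above attributes a drift in model performance to individual drivers using Shapley values. The sketch below shows only the generic Shapley computation, not the DBShap framework itself. The three drivers (two features plus the input-output relation, labeled "concept") and the loss-increase table are hypothetical numbers chosen for illustration.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

# Hypothetical toy table: increase in model loss when only the listed
# drivers are switched from the old distribution to the drifted one.
LOSS_INCREASE = {
    frozenset(): 0.00,
    frozenset({"age"}): 0.02,
    frozenset({"income"}): 0.05,
    frozenset({"concept"}): 0.10,
    frozenset({"age", "income"}): 0.08,
    frozenset({"age", "concept"}): 0.12,
    frozenset({"income", "concept"}): 0.17,
    frozenset({"age", "income", "concept"}): 0.20,
}

contributions = shapley_values(["age", "income", "concept"], LOSS_INCREASE.__getitem__)
```

The contributions sum to the total loss increase of the full drift (0.20 in this table), which is the efficiency property that makes Shapley values usable as an attribution. Exact enumeration grows factorially with the number of drivers, so a practical setting with many features would need sampling or another approximation.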
On the Change of Decision Boundaries and Loss in Learning with Concept Drift
Hinder, Fabian, Vaquet, Valerie, Brinkrolf, Johannes, Hammer, Barbara
The world that surrounds us is subject to constant change, which also affects the increasing amount of data collected over time in social media, sensor networks, IoT devices, etc. Those changes, referred to as concept drift, can be caused by seasonal changes, changing demands of individual customers, aging or failing sensors, and many more. As drift constitutes a major issue in many applications, considerable research is focusing on this setting [4]. Depending on the domain of data and application, different drift scenarios might occur: for example, covariate shift refers to the situation in which training and test sets have different marginal distributions [9]. In recent years, a large variety of methods for learning in the presence of drift has been proposed [4], and the majority of these approaches target supervised learning scenarios.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Nebraska (0.04)
Task-Sensitive Concept Drift Detector with Constraint Embedding
Castellani, Andrea, Schmitt, Sebastian, Hammer, Barbara
Detecting drifts in data is essential for machine learning applications, as changes in the statistics of processed data typically have a profound influence on the performance of trained models. Most of the available drift detection methods are either supervised and require access to the true labels at inference time, or they are completely unsupervised and look for changes in distributions without taking label information into account. We propose a novel task-sensitive semi-supervised drift detection scheme, which utilizes label information while training the initial model, but takes into account that supervised label information is no longer available when using the model during inference. It utilizes a constrained low-dimensional embedding representation of the input data, which makes it well suited to the classification task. It is able to detect real drift, where the drift affects the classification performance, while it properly ignores virtual drift, where the classification performance is not affected by the drift. In the proposed framework, the actual method to detect a change in the statistics of incoming data samples can be chosen freely. Experimental evaluation on nine benchmark datasets, with different types of drift, demonstrates that the proposed framework can reliably detect drifts and outperforms state-of-the-art unsupervised drift detection approaches.
- Europe > Germany (0.04)
- North America > United States > New York (0.04)
Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model
Oliveira, Gustavo, Minku, Leandro, Oliveira, Adriano
Real-world applications have been dealing with large amounts of data that arrive over time and generally present changes in their underlying joint probability distribution, i.e., concept drift. Concept drift can be subdivided into two types: virtual drift, which affects the unconditional probability distribution p(x), and real drift, which affects the conditional probability distribution p(y|x). Existing work focuses on real drift. However, strategies to cope with real drift may not be the best suited for dealing with virtual drift, since the real class boundaries remain unchanged. We provide the first in-depth analysis of the differences between the impact of virtual and real drifts on classifiers' suitability. We propose an approach to handle both drifts called On-line Gaussian Mixture Model With Noise Filter For Handling Virtual and Real Concept Drifts (OGMMF-VRD). Experiments with 7 synthetic and 3 real-world datasets show that OGMMF-VRD obtained the best results in terms of average accuracy, G-mean and runtime compared to existing approaches. Moreover, its accuracy over time suffered less performance degradation in the presence of drifts. In recent years, real-world applications like credit card fraud detection, flight delay and weather forecasting have [...] Such sequences of data are known as data streams [2, 3]. They are challenging for data modeling [...] learned decision boundaries, which need to be adjusted for the classifier to remain suitable. [...] data stream learning approaches treat virtual drifts using the same strategies as for real drifts [6].
- South America > Brazil > Pernambuco > Recife (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
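The OGMMF-VRD abstract above distinguishes virtual drift, a change in p(x), from real drift, a change in p(y|x). The sketch below is a minimal 1-D illustration of that distinction and is not taken from any of the listed papers. It assumes a deterministic labeling rule and a fixed classifier that was learned when the true boundary sat at 0; the means, boundaries and sample sizes are hypothetical.

```python
import random

def sample(n, mean, true_boundary, rng):
    """Draw n points x ~ N(mean, 1) and label them y = 1 if x > true_boundary."""
    xs = [rng.gauss(mean, 1.0) for _ in range(n)]
    ys = [1 if x > true_boundary else 0 for x in xs]
    return xs, ys

def accuracy(xs, ys, learned_boundary=0.0):
    """Accuracy of a fixed classifier that predicts 1 when x > learned_boundary."""
    hits = sum((1 if x > learned_boundary else 0) == y for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)

# Reference data: p(x) centred at 0, true boundary at 0.
acc_reference = accuracy(*sample(5000, 0.0, 0.0, rng))

# Virtual drift: p(x) moves to mean 1.5, but p(y|x) (the boundary) is unchanged.
acc_virtual = accuracy(*sample(5000, 1.5, 0.0, rng))

# Real drift: p(x) is unchanged, but the true boundary moves from 0 to 1.
acc_real = accuracy(*sample(5000, 0.0, 1.0, rng))
```

Under these assumptions the fixed classifier stays perfectly accurate under the virtual drift and loses accuracy under the real drift, because every point between 0 and 1 is now mislabeled by the old boundary. A detector that only watches p(x) would raise an alarm in the first case and miss the second, which is why several of the entries in this listing stress ignoring virtual drift.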
Concept Drift Detection and Adaptation with Weak Supervision on Streaming Unlabeled Data
Concept drift in learning and classification occurs when the statistical properties of either the data features or target change over time; evidence of drift has appeared in search data, medical research, malware, web data, and video. Drift adaptation has not yet been addressed in high dimensional, noisy, low-context data such as streaming text, video, or images due to the unique challenges these domains present. We present a two-fold approach to deal with concept drift in these domains: a density-based clustering approach to deal with virtual concept drift (change in statistical properties of features) and a weak-supervision step to deal with real concept drift (change in statistical properties of target). Our density-based clustering avoids problems posed by the curse of dimensionality to create an evolving 'map' of the live data space, thereby addressing virtual drift in features. Our weak-supervision step leverages high-confidence labels (oracle or heuristic labels) to generate weighted training sets to generalize and update existing deep learners to adapt to changing decision boundaries (real drift) and create new deep learners for unseen regions of the data space. Our results show that our two-fold approach performs well with >90% precision in 2018, four years after initial deployment in 2014, without any human intervention.
- Information Technology (0.48)
- Health & Medicine (0.48)